# Replication Materials for "Algorithmic and human prediction of success in human collaboration from visual features."

This repository contains the code and data needed to replicate the results 
reported in "Saveski, M., Awad, E., Rahwan, I., Cebrian, M. Algorithmic and 
human prediction of success in human collaboration from visual features. 
Sci Rep 11, 2756 (2021)."


## Code
The code was written in R 3.4 and Python 2.7.

1. `code/analysis`
    - `study_1_preprocessing.R` and `study_2_preprocessing.R` contain code that 
        takes the raw survey data and transforms it to a format suitable for 
        further analyses. 
    - The remaining scripts take as input the main data files 
        (`features_labels_narrow.csv` and `features_labels_wide.csv`, see below) 
        or the processed survey data and generate the corresponding figures.

2. `code/data_collection`
    - `get_photos.py` uses the Facebook API (v3.2) to fetch the albums and 
        photos from the Facebook groups associated with the four Escape the 
        Room locations and download the photos in the highest resolution 
        available.

3. `code/data_processing`
    - `face_annotations.py` uses the Face++ API to detect the group members' 
        faces and extract their characteristics.
    - `facepp.py` contains code to interface with the Face++ API (borrowed).
    - `ocr_annotations.py` contains sample code for using the Google OCR API 
        (borrowed).
    - `blur_photos.py` uses the `skimage` library to apply Gaussian filters 
        that blur the images everywhere except the group members' faces. The 
        blurred images were used in the survey experiments.
    - `make_features.py` takes the Face++ outputs and builds the feature matrix.

4. `code/machine_predictions`
    - `test_classifiers.py` tests the performance of several off-the-shelf 
        machine learning algorithms using nested 10-fold cross-validation 
        (Fig 3).
    - `print_classifier_results.py` has a function that outputs the results in 
        a form suitable for further processing and a function for testing the 
        statistical significance between the different methods.
    - `make_test_set_preds.py` sets aside the test images used in the surveys, 
        trains various classifiers on all the other data, and makes predictions 
        on the test set (results in Fig 7a).
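
The nested cross-validation mentioned for `test_classifiers.py` can be illustrated with a minimal, standard-library sketch of how the folds nest. This is not the repository's implementation: the function names `k_fold_indices` and `nested_cv_splits`, the fixed seed, and the sample count in the usage line are assumptions made for illustration. Each outer fold holds out a test set, and the remaining samples are split again into inner folds that are used only for model selection.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th element starting at position i.
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only
    from outer_train, so the outer test fold never influences tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        # Hypothetical choice: a different seed per outer fold.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Illustrative usage with 100 hypothetical samples.
splits = list(nested_cv_splits(n_samples=100, outer_k=10, inner_k=10))
```

In practice, a model would be tuned on the inner splits, refit on `outer_train`, and scored once on `outer_test`, so that the reported performance never comes from data used for tuning.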


## Data

1. `data/features_labels_narrow.csv` contains the aggregate characteristics of 
    groups in the photos (each row is a different group/photo). The fields 
    are explained in the paper (Section "Characteristics of successful groups").

2. `data/features_labels_wide.csv` is the same as above, but the categorical 
    variables are also encoded using one-hot encoding.

3. `data/study_1/`
    - `responses_per_image.csv` contains the predictions (`Esc`) by 
        the survey respondents (`TurkerID`, hashed) in the two conditions 
        (`Condition`) for each of the selected images (`Img`). See Fig 5a.
    - `responses_factors.csv` contains the factors reported as most useful by 
        the survey participants (see Fig 6).

4. `data/study_2/`
    - `responses.csv`, similar to `study_1/responses_per_image.csv`, contains 
        the per-image survey responses of the participants in the four 
        conditions (Fig 5b).
    - `machine_predictions_[xxx].csv` contains the predictions made by each of 
        the classifiers on the set of images used in the survey. 
        Generated using `code/machine_predictions/make_test_set_preds.py`.
    - `photo_ids.txt` contains the ids of the photos used in the survey.
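
The relationship between `features_labels_narrow.csv` and `features_labels_wide.csv` can be sketched as follows. The example expands one categorical column into 0/1 indicator columns using only the standard library. The column name `room`, its values, and the `<column>_<value>` naming scheme are hypothetical placeholders, not fields from the released files. The sketch also drops the original column, which may or may not match what the wide file does.

```python
def one_hot_encode(rows, column):
    """Return copies of `rows` with `column` replaced by 0/1 indicators.

    One indicator column is created per distinct value, named
    "<column>_<value>". Categories are sorted so that the set of output
    columns is deterministic.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being expanded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row["{}_{}".format(column, cat)] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded


# Hypothetical "narrow" rows: one row per group photo.
narrow = [
    {"photo": "a", "room": "office"},
    {"photo": "b", "room": "home"},
    {"photo": "c", "room": "office"},
]
wide = one_hot_encode(narrow, "room")
```

Each row of `wide` then has exactly one indicator set to 1 among the columns derived from `room`.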


## Data Collection
All group photos used in this work are publicly available, posted on public 
Facebook pages. Nevertheless, to preserve the privacy of those in the photos, 
we do not release the raw images. We do provide code to find and download them 
if they are still publicly available (`code/data_collection/get_photos.py`).

We use the following API endpoint to retrieve all albums of each of the
Escape the Room Facebook groups:
https://developers.facebook.com/docs/graph-api/reference/v3.2/group/albums
and the following endpoint to retrieve all photos from each album:
https://developers.facebook.com/docs/graph-api/reference/v3.2/album/photos

The group IDs of the four groups we studied are:
1. New York City: 467048970077836
2. Boston: 772881386093091
3. Texas: 589993821132754
4. Arizona: 310631999138438
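
As a rough illustration of how the two endpoints fit together, the sketch below builds the request URLs for the albums of each group and for the photos of an album. It is not taken from `get_photos.py`: the helper names, the `graph.facebook.com` base URL, and the `YOUR_ACCESS_TOKEN` placeholder are assumptions, and the code is written for Python 3 rather than the Python 2.7 used in this repository. No network request is made.

```python
from urllib.parse import urlencode

# Assumed base URL for version 3.2 of the Graph API.
GRAPH_API_BASE = "https://graph.facebook.com/v3.2"

# Group IDs as listed in this README.
GROUP_IDS = {
    "New York City": "467048970077836",
    "Boston": "772881386093091",
    "Texas": "589993821132754",
    "Arizona": "310631999138438",
}


def edge_url(node_id, edge, access_token):
    """Build the URL for an edge (e.g. "albums") of a Graph API node."""
    query = urlencode({"access_token": access_token})
    return "{}/{}/{}?{}".format(GRAPH_API_BASE, node_id, edge, query)


def albums_url(group_id, access_token):
    """URL listing the albums of a group."""
    return edge_url(group_id, "albums", access_token)


def photos_url(album_id, access_token):
    """URL listing the photos of an album."""
    return edge_url(album_id, "photos", access_token)


# "YOUR_ACCESS_TOKEN" is a placeholder; a valid token is required.
album_urls = {city: albums_url(gid, "YOUR_ACCESS_TOKEN")
              for city, gid in GROUP_IDS.items()}
```

A working client would additionally need to send the requests, follow the pagination links in the responses, and hold the permissions required to read each group; those steps are omitted here.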

